Introduction to Grain Size Classification 📚

This classification is based on recognizing the parallelism in a program to be executed on a multiprocessor system. The idea is to identify the sub-tasks or instructions in a program that can be executed in parallel.

🔄Parallel Execution Example

For example, suppose a program contains three statements, and statements S1 and S2 can be interchanged without affecting the result. This means they are not sequentially dependent, as shown below, so S1 and S2 can be executed in parallel.

Sequential vs. Parallel Execution of S1 and S2
Sequential Execution:
S1: a = b + c
S2: d = e + f
S3: g = a + d

Parallel Execution:
S1: a = b + c    S2: d = e + f
      ↓                  ↓
S3: g = a + d
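On a real multiprocessor, S1 and S2 would be dispatched to different processors. As a rough sketch, the same dependency structure can be modeled in JavaScript with Promise.all; note that this only models the ordering constraints (the async wrappers are purely illustrative), not true multiprocessor execution:

const b = 1, c = 2, e = 3, f = 4;
async function S1() { return b + c; }   // a = b + c
async function S2() { return e + f; }   // d = e + f
async function main() {
    const [a, d] = await Promise.all([S1(), S2()]); // S1 and S2 are independent
    const g = a + d;                                // S3 must wait for both
    console.log(g);                                 // 10
}
main();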

⚖️Factors Affecting Parallelism

However, checking for parallelism between statements or processes in a program is not sufficient by itself. The decision to exploit parallelism also depends on the following factors:

🖥️

Architectural Features

Number and types of processors available

💾

Memory Organization

How memory is structured and accessed

🔗

Dependencies

Data, control, and resource dependencies

Parallelism Conditions 🔗

As mentioned earlier, parallel execution requires that the segments to be run concurrently be independent of one another. Therefore, before implementing parallelism, all the conditions for parallelism between the segments must be examined.

📊Dependency Relations

In this section, we discuss three kinds of dependencies between segments:

💾

Data Dependency

When two or more instructions access the same data

🎛️

Control Dependency

Dependencies within control structures

🔧

Resource Dependency

When instructions utilize the same shared resource

💾Data Dependency

Data dependency refers to the situation in which two or more instructions access the same data. The following kinds of data dependencies are identified:

➡️

Flow Dependence

If instruction I2 follows I1 and output of I1 becomes input of I2, then I2 is said to be flow dependent on I1

⬅️

Anti-dependence

When instruction I2 follows I1 such that output of I2 overlaps with the input of I1 on the same data

🔄

Output Dependence

When output of the two instructions I1 and I2 overlap on the same data

📁

I/O Dependence

When read and write operations by two instructions are invoked on the same file
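For instance, an I/O dependence can be illustrated with a hypothetical Node.js snippet in which two statements operate on the same file, so they cannot be reordered or overlapped safely (the file name is illustrative):

const fs = require("fs");

// I1 writes data.txt; I2 reads the same file, so I2 is I/O dependent on I1.
fs.writeFileSync("data.txt", "hello");            // I1
const text = fs.readFileSync("data.txt", "utf8"); // I2
console.log(text); // "hello"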

💻Data Dependency Example

I1: a = b
I2: c = a + d
I3: a = c

This program segment contains instructions I1, I2, and I3 that have various dependencies:

  • I1 and I2 are flow dependent because I1 generates variable a as output, which is then used by I2 as input
  • I2 and I3 are anti-dependent since I3 generates variable a but I2 uses it, and I2 comes before I3 in the sequence
  • I3 is flow dependent on I2 due to variable c
  • I3 and I1 are output dependent because both instructions generate variable a
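These relations can be checked mechanically from each instruction's read and write sets. Below is a minimal JavaScript sketch (the function and variable names are hypothetical, not a standard API):

// Classify the dependence of i2 on i1, where i2 follows i1 in program order.
function intersects(a, b) {
    for (const x of a) if (b.has(x)) return true;
    return false;
}

function classify(i1, i2) {
    const kinds = [];
    if (intersects(i1.write, i2.read)) kinds.push("flow");    // i1 writes what i2 reads
    if (intersects(i1.read, i2.write)) kinds.push("anti");    // i2 writes what i1 reads
    if (intersects(i1.write, i2.write)) kinds.push("output"); // both write the same data
    return kinds;
}

// Read/write sets for I1: a = b, I2: c = a + d, I3: a = c
const I1 = { read: new Set(["b"]),      write: new Set(["a"]) };
const I2 = { read: new Set(["a", "d"]), write: new Set(["c"]) };
const I3 = { read: new Set(["c"]),      write: new Set(["a"]) };

console.log(classify(I1, I2)); // ["flow"]          (on a)
console.log(classify(I2, I3)); // ["flow", "anti"]  (flow on c, anti on a)
console.log(classify(I1, I3)); // ["output"]        (both write a)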

🎛️Control Dependence

Instructions or segments in a program often contain control structures. As a result, dependency among the statements can also occur within control structures. However, the order in which instructions in control structures will execute is not known until run time.

for (i = 1; i <= n; i++) {
    if (x[i - 1] == 0)
        x[i] = 0;
    else
        x[i] = 1;
}

In the above control structure, the successive iterations are dependent on one another because the value of x[i] in iteration i depends on the value of x[i-1] from the previous iteration.
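By contrast, a loop whose iterations read and write only their own elements carries no such dependence, so its iterations can run in parallel. A minimal sketch (array contents are illustrative):

const a = [1, 2, 3], b = [4, 5, 6], y = [];
// Iteration i touches only index i, so no iteration depends on a value
// produced by another iteration; all iterations could run simultaneously.
for (let i = 0; i < a.length; i++) {
    y[i] = a[i] + b[i];
}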

🔧Resource Dependence

Parallelism between instructions can also be affected by shared resources. If two instructions utilize the same shared resource, there is a resource dependency between them.

🧮

ALU Dependency

When floating point units or registers are shared

💾

Storage Dependency

When memory is being shared

Bernstein Conditions for Detection of Parallelism 📐

To execute instructions or blocks of instructions in parallel, it must be ensured that the instructions are independent of one another. Instructions can be data dependent, control dependent, or resource dependent on each other. Here we consider only data dependency among statements when deciding on parallel execution.

📝Bernstein's Sets

A. J. Bernstein elaborated on the idea of data dependency and derived a set of conditions for deciding whether instructions or processes can execute in parallel. Bernstein's conditions are based on the following two sets of variables:

📖

Read Set (RI)

The input set, consisting of the memory locations read by statement SI

✍️

Write Set (WI)

The output set, consisting of the memory locations written to by statement SI

Note that RI and WI are not necessarily disjoint, since the same location may be both read and written by SI.

📐Bernstein Parallelism Conditions

The following Bernstein conditions are used to determine whether two statements can be executed in parallel:

1️⃣

First Condition

Locations in R1 from which S1 reads and the locations W2 onto which S2 writes must be mutually exclusive: R1 ∩ W2 = φ

2️⃣

Second Condition

Locations in R2 from which S2 reads and the locations W1 onto which S1 writes must be mutually exclusive: R2 ∩ W1 = φ

3️⃣

Third Condition

The memory locations W1 and W2 onto which S1 and S2 write, should not overlap: W1 ∩ W2 = φ
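These three checks are easy to automate. The following is a minimal JavaScript sketch (the helper names are hypothetical) that tests Bernstein's conditions for two statements given their read and write sets:

// Returns true when the two sets share no memory location.
function disjoint(a, b) {
    for (const x of a) if (b.has(x)) return false;
    return true;
}

// All three Bernstein conditions must hold for parallel execution.
function bernstein(r1, w1, r2, w2) {
    return disjoint(r1, w2) &&  // R1 ∩ W2 = φ
           disjoint(r2, w1) &&  // R2 ∩ W1 = φ
           disjoint(w1, w2);    // W1 ∩ W2 = φ
}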

💻Bernstein Conditions Example

To show the operation of Bernstein's conditions, consider the following instructions of a sequential program:

I1: x = (a + b) / (a * b)
I2: y = (b + c) * d
I3: z = x² + (a * e)

Now, the read set and write set of I1, I2 and I3 are as follows:

📖

I1 Sets

R1 = {a, b}
W1 = {x}

📖

I2 Sets

R2 = {b, c, d}
W2 = {y}

📖

I3 Sets

R3 = {x, a, e}
W3 = {z}

🔍Checking Parallelism

Now let us find out whether I1 and I2 are parallel or not:

  • R1 ∩ W2 = {a, b} ∩ {y} = φ ✓
  • R2 ∩ W1 = {b, c, d} ∩ {x} = φ ✓
  • W1 ∩ W2 = {x} ∩ {y} = φ ✓

That means I1 and I2 are independent of each other and can be executed in parallel.

Similarly for I1 || I3:

  • R1 ∩ W3 = {a, b} ∩ {z} = φ ✓
  • R3 ∩ W1 = {x, a, e} ∩ {x} = {x} ≠ φ ✗
  • W1 ∩ W3 = {x} ∩ {z} = φ ✓

Hence I1 and I3 are not independent of each other and cannot be executed in parallel.

For I2 || I3:

  • R2 ∩ W3 = {b, c, d} ∩ {z} = φ ✓
  • R3 ∩ W2 = {x, a, e} ∩ {y} = φ ✓
  • W3 ∩ W2 = {z} ∩ {y} = φ ✓

Hence, I2 and I3 are independent of each other and can be executed in parallel.

Parallelism Results

  • I1 || I2: Can run in parallel
  • I1 || I3: Cannot run in parallel
  • I2 || I3: Can run in parallel
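Using the bernstein helper sketched earlier (variable names are illustrative), these three checks can be reproduced mechanically:

// Assumes the bernstein/disjoint helpers from the sketch above.
const R1 = new Set(["a", "b"]),      W1 = new Set(["x"]);
const R2 = new Set(["b", "c", "d"]), W2 = new Set(["y"]);
const R3 = new Set(["x", "a", "e"]), W3 = new Set(["z"]);

console.log(bernstein(R1, W1, R2, W2)); // true  -> I1 || I2
console.log(bernstein(R1, W1, R3, W3)); // false -> x is in both R3 and W1
console.log(bernstein(R2, W2, R3, W3)); // true  -> I2 || I3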

Parallelism Based On Grain Size 🌾

📏Definition of Grain Size

Grain size, or granularity, is a measure of how much computation is involved in a process. It is determined by counting the number of instructions in a program segment.

📊Types of Grain Sizes

🔬
Fine Grain

Typically fewer than 20 instructions

⚖️
Medium Grain

Typically fewer than 500 instructions

🏔️
Coarse Grain

Typically 1000 or more instructions

Types of Grain Sizes
Fine Grain Example:
a = b + c
d = e + f
g = a + d

Medium Grain Example:
function calculateSum(arr) {
    let sum = 0;
    for (let i = 0; i < arr.length; i++) {
        sum += arr[i];
    }
    return sum;
}

Coarse Grain Example:
function processImage(image) {
    // Stands in for an image-processing algorithm comprising
    // hundreds or thousands of instructions.
    let processedImage = image; // placeholder for the real computation
    return processedImage;
}

🔄Parallelism Levels

According to the grain sizes, parallelism in a program can be categorized into different levels. These parallelism levels make up a hierarchy where processes become finer-grained at lower levels. As the level increases, the degree of parallelism decreases.

📝

Instruction Level

The most basic level with the highest degree of parallelism. Fine grain size with just a few instructions per grain.

🔄

Loop Level

Involves parallelizing iterative loop instructions. Also fine grain size. Simple loops are easy to parallelize.

🧩

Procedure Level

Consists of procedures, subroutines or subprograms. Medium grain size, typically with up to several hundred instructions per procedure.

📚

Program Level

The highest level, consisting of independent programs. Coarse grain size with tens of thousands of instructions per program.

📊Parallelism Levels Characteristics

  • Instruction Level (Fine grain): Highest degree of parallelism, but the most overhead. Implemented in hardware (superscalar, VLIW).
  • Loop Level (Fine grain): Simple loops are easy to parallelize; recursive loops are difficult. Compilers can often parallelize these automatically.
  • Procedure Level (Medium grain): Procedures, subroutines, or subprograms. Exploited by programmers; not yet by compilers.
  • Program Level (Coarse grain): Tens of thousands of instructions; independent programs. Handled by time sharing in the operating system.

Parallelism Levels and System Types 🏗️

📊Relation Between Grain Sizes and Parallelism

  • Fine Grain: Instruction or Loop Level; typical of SIMD organizations
  • Medium Grain: Procedure or Subprogram Level; typical of loosely coupled systems
  • Coarse Grain: Program Level; typical of tightly coupled or shared memory multiprocessors (e.g., Cray Y-MP)

🖥️System Types for Different Grain Sizes

🔗

Tightly Coupled Systems

Typically used for coarse grain parallelism. Examples include shared memory multiprocessors like the Cray Y-MP.

📦

Loosely Coupled Systems

Utilized to execute medium grain program segments. Distributed memory systems fall into this category.

🔀

SIMD Organization

Fine grain parallelism is typically exploited in the SIMD (Single Instruction, Multiple Data) organization of computers.

💡Practical Considerations

📡

Communication Overhead

Fine grain parallelism requires more communication and synchronization overhead

📈

Scalability

Coarse grain parallelism is generally more scalable to larger systems

👨‍💻

Programming Complexity

Fine grain parallelism is often more complex to program effectively

🔄Parallelism Hierarchy

Parallelism Levels Hierarchy
Program Level (Coarse Grain) - Highest level, lowest parallelism degree 📚
Procedure Level (Medium Grain) - Medium parallelism degree 🧩
Loop Level (Fine Grain) - High parallelism degree 🔄
Instruction Level (Fine Grain) - Highest parallelism degree, lowest level 📝

Summary 📝

📚Classification Systems Covered

🔄

Flynn's Classification

Based on instruction and data streams (SISD, SIMD, MISD, MIMD)

🏗️

Handler's Classification

Categorizes computers at three distinct tiers: PCU, ALU, and BLC

🔗

Structural Classification

Tightly coupled (shared memory) and loosely coupled (distributed memory) systems

🌾

Grain Size Classification

Based on the amount of computation in a process (fine, medium, coarse)

🔍Key Concepts in Grain Size Classification

🔗

Dependencies

Data, control, and resource dependencies affect parallelism

📐

Bernstein Conditions

Mathematical conditions to determine if instructions can run in parallel

📏

Grain Sizes

Fine (<20 instructions), Medium (<500 instructions), Coarse (≥1000 instructions)

🏗️

Parallelism Levels

Instruction, Loop, Procedure, and Program levels with decreasing parallelism degree

💡Final Thoughts

Understanding grain size classification is essential for designing efficient parallel computing systems. The choice of grain size affects the system's performance, scalability, and programming complexity. Fine-grained parallelism offers high parallelism but with more overhead, while coarse-grained parallelism is easier to implement but may not utilize all available resources optimally.

By analyzing dependencies using Bernstein conditions and selecting the appropriate grain size for the target architecture, we can design parallel systems that effectively balance parallelism with overhead, leading to optimal performance for a wide range of applications.